Performance Oriented Prefetching Enhancements Using Commit Stalls
Authors
Abstract
Loads that miss in the L1 or L2 cache and wait for their data at the head of the ROB cause significant slowdown in the form of commit stalls. We observe that most of these commit stalls are caused by a small set of loads, which we refer to as LIMCOS (Loads Incurring Majority of COmmit Stalls). We propose simple history-based classifiers that track the commit stalls suffered by each load in order to identify this small set. In this paper we study two prefetching enhancements enabled by these classifiers. In the first enhancement, the classifiers are used to train the prefetcher to focus on the misses suffered by LIMCOS. This technique, referred to as focused prefetching, yields a 9.8% gain in IPC over a naive GHB-based delta correlation prefetcher, along with a 20.3% reduction in memory traffic, for a set of 17 memory-intensive SPEC2000 benchmarks. Another important effect of focused prefetching is a 61% improvement in prefetch accuracy. We demonstrate that the proposed classification criterion performs better than existing criteria such as criticality and delinquent loads. We also show that the criterion of focusing on commit stalls is robust across cache levels and can be applied to any prefetcher without modifying the prefetcher itself. In addition, we demonstrate the positive impact of focused prefetching in a multi-core scenario. For global-history-based prefetchers, we demonstrate not only the applicability of focused prefetching but also the second classifier-based enhancement: filtering of prefetches once they are generated.
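The classifier idea in the abstract can be illustrated with a minimal sketch. It assumes a per-load-PC table of accumulated commit-stall cycles and a fixed threshold. The class name, the function names, the threshold value, and the addresses are all hypothetical and are not taken from the paper, whose actual classifier design may differ.

```python
from collections import defaultdict


class CommitStallClassifier:
    """Hypothetical per-PC history table; not the paper's exact design."""

    def __init__(self, threshold_cycles=100):
        # threshold_cycles is an assumed tuning knob, not a value from the paper.
        self.threshold_cycles = threshold_cycles
        # Maps load PC -> commit-stall cycles accumulated so far.
        self.stall_cycles = defaultdict(int)

    def record_commit_stall(self, load_pc, cycles):
        """Charge the cycles a missing load spent blocking the ROB head."""
        self.stall_cycles[load_pc] += cycles

    def is_limcos(self, load_pc):
        """Treat a load as LIMCOS once its accumulated stalls reach the threshold."""
        return self.stall_cycles.get(load_pc, 0) >= self.threshold_cycles


def focused_training_stream(misses, classifier):
    """Keep only the (pc, address) misses caused by loads classified as LIMCOS."""
    return [(pc, addr) for pc, addr in misses if classifier.is_limcos(pc)]


classifier = CommitStallClassifier(threshold_cycles=100)
classifier.record_commit_stall(0x400A10, 80)
classifier.record_commit_stall(0x400A10, 60)  # this load now totals 140 cycles
classifier.record_commit_stall(0x400B20, 12)  # this load stays below the threshold

misses = [(0x400A10, 0x1000), (0x400B20, 0x2000), (0x400A10, 0x1040)]
training = focused_training_stream(misses, classifier)
```

In this sketch the prefetcher itself is left untouched and only its training input is narrowed, which mirrors the abstract's claim that the criterion can be applied without modifying the prefetcher.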
Related papers
A Trace-Driven Comparison of Algorithms for Parallel Prefetching and Caching (CMU-CS-96-174)
High-performance I/O systems depend on prefetching and caching in order to deliver good performance to applications. These two techniques have generally been considered in isolation, even though there are significant interactions between them; a block prefetched too early reduces the effectiveness of the cache, while a block cached too long reduces the effectiveness of prefetching. In this paper ...
Energy-Constrained Prefetching Optimization in Embedded Applications
In energy-constrained settings, most low-power compiler optimization techniques take the approach of minimizing the energy consumption while meeting no performance loss. However, it is possible that the available energy budget is not sufficient to meet the optimal performance objective. To limit energy consumption within a given energy budget, energy-constrained optimization approach is more si...
Streaming Prefetch
In most commercial processors, data prefetching has been disregarded as a potentially effective solution to hide cache misses, multi-level caches being widely preferred. However, multi-level caches are mostly effective at removing capacity and conflict misses, while prefetching is particularly efficient for removing compulsory misses, especially in the regular accesses found in numerical codes. One ...
Exploring the limits of prefetching
We formulate a new approach for evaluating a prefetching algorithm. We first carry out a profiling run of a program to identify all of the misses and corresponding locations in the program where prefetches for the misses can be initiated. We then systematically control the number of misses that are prefetched, the timeliness of these prefetches, and the number of unused prefetches. We validate ...
Architecture for Cooperative Prefetching in P2P Video-on-Demand System
Most P2P VoD schemes focused on service architectures and overlays optimization without considering segments rarity and the performance of prefetching strategies. As a result, they cannot better support VCR oriented service in heterogeneous environment having clients using free VCR controls. Despite the remarkable popularity in VoD systems, there exist no prior work that studies the performanc...
Journal title:
- J. Instruction-Level Parallelism
Volume 13, Issue
Pages -
Publication date: 2011